Search results containing a local pack often get the majority of clicks. Knowing which local ranking factors to optimize for the biggest impact is crucial for SEOs and business owners alike.
Smaller data studies as well as opinion-based surveys have sought to uncover the relevance and importance of local ranking factors, namely https://www.localseoguide.com/guides/local-seo-ranking-factors/, https://moz.com/local-search-ranking-factors, and https://www.brightlocal.com/research/how-car-dealerships-are-speeding-ahead-with-google-my-business/
However, in our view, most of the studies are outdated and contain severe statistical and methodological flaws.
This study intends to fill that gap and shed some light on which local ranking factors matter most in the personal injury niche.
Step 1 Keyword Selection: We defined 4 unique keyword combinations for 426 US cities with populations above 100k, giving 1659 search results in total. The search queries had the following format:
- (city) + " car accident lawyer"
- (city) + " personal injury lawyer"
- (city) + " car accident attorney"
- (city) + " personal injury attorney"
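As a rough illustration of the query format above, the sketch below builds one query per city and keyword template. The two city names are placeholders and not the study's actual city list; `build_queries` is a hypothetical helper name.

```python
# Keyword templates as described in the study.
TEMPLATES = [
    "car accident lawyer",
    "personal injury lawyer",
    "car accident attorney",
    "personal injury attorney",
]


def build_queries(cities):
    """Return one query per (city, template) pair, with the city first."""
    return [f"{city} {template}" for city in cities for template in TEMPLATES]


# Hypothetical sample; the study used 426 US cities.
example_cities = ["Milwaukee", "Austin"]
queries = build_queries(example_cities)

print(len(queries))  # 2 cities x 4 templates = 8
print(queries[0])    # Milwaukee car accident lawyer
```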
Step 2 Data Mining:
| Type | Value |
|---|---|
| #Cities | 426 |
| #Keywords | 4 |
| Total # of searches | 1674 |
| #Unique place IDs | 12931 |
| Searches with fewer than 10 results | 0.25% |
| Searches with 10 to 15 results | 1.31% |
| Searches with 16 to 19 results | 5.44% |
| Searches with a full first-page | 93% |
Step 3 Data Enrichment:
Step 4 Data Analysis: The goal of the statistical modelling in this study is to find answers to three key questions:
The statistical method chosen for this study was the gradient boosted decision trees (GBDT) model. GBDT is a widely used machine learning technique that applies to many settings, ranging from regression and classification to learning-to-rank problems. In a learning-to-rank problem, there is an ordered list of items, and the goal of the model is to calculate a score for each item from the explanatory variables (features) such that the original order is retained.
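To make the idea of boosting concrete, here is a minimal sketch that fits one-split trees (stumps) to the residuals of a squared-error loss and then orders items by their predicted score. This is a simplified pointwise stand-in, not the study's model: the library, ranking objective, and hyperparameters used in the study are not stated, and the features, toy data, and function names below are all assumptions.

```python
def fit_stump(X, residuals):
    """Return the one-split tree (feature, threshold, left mean, right mean)
    with the lowest squared error, or None if no split separates the rows."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def stump_predict(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_gbdt(X, y, n_trees=300, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)


# Hypothetical toy search: [same_city (0/1), relative_reviews] -> position.
X = [[1, 0.9], [1, 0.6], [1, 0.3], [0, 0.8], [0, 0.5], [0, 0.1]]
y = [1, 2, 3, 4, 5, 6]  # 1 is the best position

model = fit_gbdt(X, y)
scores = [predict(model, row) for row in X]
# Lower predicted score means a better predicted position.
predicted_order = sorted(range(len(X)), key=scores.__getitem__)
print(predicted_order)
```

A dedicated ranking objective would optimize the ordering within each search directly; the regression loss here is only the simplest way to show the boosting loop.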
In building the model, the data set was split into two parts: training data (containing around 70% of searches) and test data (the remaining 30% or so). The GBDT model was fitted on the training data, predictions were calculated for the test data, and finally the predictions were compared to the actual observed rankings. The chosen evaluation metric was Spearman’s rank correlation coefficient, a scaled measure of the agreement between two rankings. Perfectly matching rankings give a value of 1, the expected value for random rankings is 0, and exactly reversed rankings give a value of -1.
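The evaluation procedure can be sketched in plain Python as follows. It assumes the split is made at the level of whole searches (so all results of one search land in the same part) and that ties receive their average rank; neither detail is stated in the study, and the function names and sample rankings are illustrative.

```python
import random


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman's rho is the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


def split_searches(search_ids, train_share=0.7, seed=0):
    """Assign whole searches to train or test (assumption: split by search)."""
    ids = sorted(set(search_ids))
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_share)
    return set(ids[:cut]), set(ids[cut:])


observed = [1, 2, 3, 4, 5]
print(spearman(observed, [1, 2, 3, 4, 5]))  # identical order: 1.0
print(spearman(observed, [5, 4, 3, 2, 1]))  # reversed order: -1.0
print(spearman(observed, [2, 1, 3, 5, 4]))  # two adjacent swaps: 0.8

train_ids, test_ids = split_searches(range(10))
print(len(train_ids), len(test_ids))  # 7 3
```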
The next step is to understand why the model makes particular predictions: which explanatory variables are the most important, and how do their values affect the predictions? For this purpose, SHapley Additive exPlanations (SHAP) values were calculated. In SHAP, each prediction is presented as a sum of the contributions of the individual explanatory variables. The overall impact of any particular variable can then be measured as the average of the absolute values of its contributions over the whole data set.
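The two properties described above, additivity and the mean absolute contribution as a global importance score, can be illustrated with a small made-up matrix of SHAP values. The feature names, base value, and numbers below are invented for illustration and are not results from the study.

```python
# Hypothetical feature names and SHAP values (rows = predictions).
feature_names = ["same_city", "relative_reviews", "relative_photos"]
base_value = 10.0  # hypothetical average model output
shap_values = [
    [-2.0, -1.0, 0.5],
    [3.0, -0.5, 0.0],
    [-2.5, 1.5, -0.5],
]


def reconstruct_prediction(base, contributions):
    """Additivity: a prediction is the base value plus all contributions."""
    return base + sum(contributions)


def mean_abs_importance(matrix, names):
    """Global importance: mean absolute SHAP value per feature."""
    n = len(matrix)
    return {name: sum(abs(row[j]) for row in matrix) / n
            for j, name in enumerate(names)}


importance = mean_abs_importance(shap_values, feature_names)
ranking = sorted(importance, key=importance.get, reverse=True)

print(reconstruct_prediction(base_value, shap_values[0]))  # 7.5
print(ranking)  # ['same_city', 'relative_reviews', 'relative_photos']
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise cancel out and look unimportant.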
All variables with the prefix “relative” are calculated as rank (values) / length (values) for the values inside each search group. For example, in a search for “Milwaukee car accident lawyer”, the entry with the highest number of photos would get a relative # of photos value equal to 1. The entry with the lowest number would get a value close to 0, and the remaining values would fall somewhere in between. The motivation behind this transformation is to make attribute values more comparable between search results, i.e. to minimize the effect of the city's population size.
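A minimal sketch of this transformation is shown below. It assumes ranks start at 1 and that ties share their average rank, which the study does not specify; the row layout and the names `relative_rank` and `add_relative` are hypothetical.

```python
from collections import defaultdict


def relative_rank(values):
    """rank(values) / length(values), with ties sharing their average rank."""
    n = len(values)
    result = []
    for v in values:
        less = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)
        rank = less + (equal + 1) / 2  # ranks start at 1
        result.append(rank / n)
    return result


def add_relative(rows, key, group="search"):
    """Add a relative_<key> column computed separately inside each search."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group]].append(row)
    for members in groups.values():
        relative = relative_rank([m[key] for m in members])
        for member, value in zip(members, relative):
            member["relative_" + key] = value
    return rows


# Hypothetical listings from two searches.
rows = [
    {"search": "Milwaukee car accident lawyer", "photos": 12},
    {"search": "Milwaukee car accident lawyer", "photos": 3},
    {"search": "Milwaukee car accident lawyer", "photos": 40},
    {"search": "Milwaukee car accident lawyer", "photos": 7},
    {"search": "Austin car accident lawyer", "photos": 400},
    {"search": "Austin car accident lawyer", "photos": 90},
]
add_relative(rows, "photos")
print([row["relative_photos"] for row in rows])
# [0.75, 0.25, 1.0, 0.5, 1.0, 0.5]
```

Under these assumptions the lowest entry in a search with n results gets 1/n rather than exactly 0, and a listing with 40 photos in one city can receive the same relative value as one with 400 photos in another.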
In this section, we analyse how different ranking factors relate to higher organic positions in the Google My Business search results.
We first look at the importance of individual ranking factors. Then, we take a deep dive into single variables that we've identified as particularly important.
The first plot shows feature importance, that is, each feature’s average contribution to the model’s predictions. The second plot shows the direction of the impact given the feature’s value. Overall, the most interesting features for us are those that a) rank high in the first plot and b) show a clear pattern in the second plot.
For example, if we look at the first row and the feature named “Has same city listed as in search query”, we can see a polarized distribution of SHAP values around zero. Yellow points correspond to low feature values (in this case, “No”); their impact on all predictions in the data set is negative, so the model predicts worse positions for them. Purple points, in contrast, correspond to high feature values (“Yes”) and have a positive impact on predicted positions.
Similarly, “Type category is personal injury” behaves like “Has same city listed as in search query”, i.e. a value of “Yes” has a positive impact on predicted positions.
The plot below shows the distribution of correlations calculated separately for each search. Overall, the mean correlation is about 0.6, indicating a fairly good fit between observations and predictions.
The explanatory variables used in this study can be roughly organized into five main groups. These are listed below, along with a few important variables suggested by the SHAP values.
In terms of SEO, the first two categories are of little interest, as they are difficult or even impossible to change or adjust, but the last three are more interesting and worth further investigation.
| Type | Value |
|---|---|
| Total unique categories | 72 |
| Missing type category | 1.99% |
| Categories with >=10 results | 26 |
| Categories with >=100 results | 13 |
| Categories with >=1000 results | 3 |
| Median unique categories in one search | 4 |
| Min unique categories in one search | 1 |
| Max unique categories in one search | 12 |
Key takeaways:
| Type | Title | Description |
|---|---|---|
| Median character length (non missing) | 24 | 534 |
| Min character length (non missing) | 4 | 8 |
| Max character length (non missing) | 125 | 752 |
| Missing | 0.01% | 40.66% |
| Containing lawyer or attorney | 22.65% | 43.62% |
| Containing car accident or personal injury | 5.32% | 44.76% |
| Containing city name | 5% | 27.07% |
Key takeaways:
| Type | Value |
|---|---|
| Median #reviews | 14 |
| Max #reviews | 968 |
| No reviews available | 16.57% |
| Average rating | 4.61 |
| Response ratio by owners | 33.43% |
| Average number of likes per review | 0.66 |
Key takeaways:
| Type | ref_domains_dofollow | total_traffic | ahrefs_rank | domain_rating |
|---|---|---|---|---|
| Median | 40 | 82 | 17841948 | 10 |
| Min | 0 | 0 | 4281 | 0 |
| Max | 13379 | 3444072 | 171527697 | 85 |
| Missing | 0.36% | 0.36% | 0.43% | 0.43% |
Key takeaways:
| Type | Value |
|---|---|
| Median #photos | 6 |
| Max #photos | 540 |
| Zero #photos | 5.77% |
| Provides Google updates | 54.83% |
Key takeaways: